United States

UNITED STATES

7,793,251

Percent of Population Infected

Summary for April 12, 2020

  • Estimated Cumulative: 7.8 M (at 1% IFR)
  • Reported Cumulative: 659 K
  • Cumulative IFR/CFR: 11.8
  • Estimated Daily: 204 K (at 1% IFR)
  • Reported Daily: 32.0 K
  • Daily IFR/CFR: 6.4

Cumulative (Total) Infections

Log scale

Daily Infections

Log scale

Time of First Infection

Fatality Count

VA

VIRGINIA

40,445

Percent of Population Infected

Summary for April 12, 2020

  • Estimated Cumulative: 40.4 K (at 1% IFR)
  • Reported Cumulative: 5.3 K
  • Cumulative IFR/CFR: 7.7
  • Estimated Daily: 1.1 K (at 1% IFR)
  • Reported Daily: 197
  • Daily IFR/CFR: 5.8

Cumulative (Total) Infections

Log scale

Daily Infections

Log scale

Time of First Infection

Fatality Count

NY

NEW YORK

2,594,101

Percent of Population Infected

Summary for April 12, 2020

  • Estimated Cumulative: 2.6 M (at 1% IFR)
  • Reported Cumulative: 189 K
  • Cumulative IFR/CFR: 13.7
  • Estimated Daily: 72.7 K (at 1% IFR)
  • Reported Daily: 8.2 K
  • Daily IFR/CFR: 8.8

Cumulative (Total) Infections

Log scale

Daily Infections

Log scale

Time of First Infection

Fatality Count

NYC

NEW YORK CITY

1,827,807

Percent of Population Infected

Summary for April 12, 2020

  • Estimated Cumulative: 1.8 M (at 1% IFR)
  • Reported Cumulative: 103 K
  • Cumulative IFR/CFR: 17.7
  • Estimated Daily: 47.8 K (at 1% IFR)
  • Reported Daily: 4.9 K
  • Daily IFR/CFR: 9.8

Cumulative (Total) Infections

Log scale

Daily Infections

Log scale

Time of First Infection

Fatality Count

Model Details

Estimated Infection Counts from Fatality Data

Actual infection counts are estimated from fatality data instead of the more biased (or even meaningless) positive test data. The only input to the model is the time-to-death distribution: the distribution of the time between infection and death. The approach is simple: using the time-to-death distribution in reverse, each death is randomly assigned to an infection day. The number of deaths assigned to a given day (adjusted for censoring), divided by the infection fatality rate (IFR), is an estimate of the true infection count on that day. Adjusting for censoring means correcting for people who were infected on previous days but for whom not enough time has elapsed to know whether they will survive.


Based on [REFs], I am using a shifted negative binomial (mean=23.9, size=7.9) distribution for time to death (given death from COVID-19):

[Figures: PMF, CDF, and Survival Function of the time-to-death distribution]

Equation

Equation here
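The time-to-death distribution described above can be sketched in a few lines. This is only an illustration, not the dashboard's actual code: the function names are hypothetical, the mean is assumed to describe the unshifted negative binomial, and the `shift` parameter defaults to 0 because the text does not state its value.

```python
import math


def shifted_nbinom_pmf(day, mean=23.9, size=7.9, shift=0):
    """P(time to death == day) for a negative binomial moved right by `shift` days.

    Assumptions: `mean` is the mean of the unshifted distribution, and
    `shift` (hypothetical, default 0) is not given in the text.
    """
    k = day - shift
    if k < 0:
        return 0.0
    p = size / (size + mean)  # success probability in the (size, p) parameterization
    log_pmf = (
        math.lgamma(k + size) - math.lgamma(k + 1) - math.lgamma(size)
        + size * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def shifted_nbinom_cdf(day, **kwargs):
    """P(time to death <= day), summed directly from the PMF."""
    return sum(shifted_nbinom_pmf(d, **kwargs) for d in range(day + 1))


pmf_20 = shifted_nbinom_pmf(20)
cdf_20 = shifted_nbinom_cdf(20)
survival_20 = 1.0 - cdf_20  # probability that death occurs more than 20 days after infection
print(f"PMF(20)={pmf_20:.4f}  CDF(20)={cdf_20:.4f}  S(20)={survival_20:.4f}")
```

Because the shift is a guess, the printed values need not match the plotted ones exactly.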

How it works

The deaths recorded on each day are randomly distributed to previous days based on the probability that infection took place on that day. The figure below illustrates how this works. For example, in the United States there were 1996 deaths recorded on Apr 12. This number shows up on the top right in the row corresponding to Apr 12. These 1996 deaths are then distributed to previous days according to the time-to-death distribution. For example, we can see from the PMF plot above that about 4.46% of the deaths on any day will be assigned to the day 20 days in the past. Thus, we would expect about 89 deaths to be assigned to Mar 23, which is 20 days before Apr 12. The figure shows results for one simulation, which assigned 72 deaths to Mar 23. This indicates that 72 of the people who died on Apr 12 were estimated to have been infected on Mar 23.

Likewise for Apr 11, 102 of the 2403 people who died were estimated to have been infected on Mar 23. Adding all the numbers in the Mar 23 column, we find that a total of 791 of the people who have died were estimated to have been infected on Mar 23.
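One way to picture the random assignment is the sketch below, which draws a lag for every death and counts how many land on each infection date. It is an assumption-laden illustration rather than the dashboard's code: the helper names, the zero shift, the 120-day truncation (`max_lag`), and the seed are all hypothetical choices.

```python
import math
import random
from collections import Counter
from datetime import date, timedelta


def nbinom_pmf(k, mean=23.9, size=7.9):
    """Negative binomial PMF; the shift is assumed to be 0 in this sketch."""
    p = size / (size + mean)
    return math.exp(
        math.lgamma(k + size) - math.lgamma(k + 1) - math.lgamma(size)
        + size * math.log(p) + k * math.log1p(-p)
    )


def assign_infection_dates(death_date, n_deaths, max_lag=120, rng=random):
    """Randomly assign each death on `death_date` to an earlier infection date."""
    lags = range(max_lag + 1)
    weights = [nbinom_pmf(k) for k in lags]  # choices() normalizes the weights
    drawn = rng.choices(lags, weights=weights, k=n_deaths)
    return Counter(death_date - timedelta(days=lag) for lag in drawn)


rng = random.Random(0)  # fixed seed so that one simulation is repeatable
deaths = {date(2020, 4, 12): 1996, date(2020, 4, 11): 2403}

totals = Counter()
for day, n in deaths.items():
    totals += assign_infection_dates(day, n, rng=rng)

# Deaths from these two days that were imputed to infections on Mar 23.
print(totals[date(2020, 3, 23)])
```

Summing the per-day counters over every reporting day, not just the two shown here, gives the column totals described above.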


Before we can use this number to estimate the number of infections, we have to make an adjustment to account for the people who were infected on Mar 23 but for whom not enough time has elapsed to know if they will survive. Referring to the CDF (cumulative distribution function) plot above, we expect 39.9% of the deaths from COVID-19 to occur within the first 20 days from infection. This implies that the 791 imputed deaths represent only about 39.9% of the deaths that we can expect to be attributed to Mar 23 once more time elapses. Dividing 791 by 39.9% to adjust for future deaths, we get an estimate of 1980.47 for the number of people infected on Mar 23 who will unfortunately succumb to COVID-19.
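The censoring adjustment is a single division, sketched below with a hypothetical function name. Note that 791 / 0.399 is about 1982.46, so the quoted 1980.47 presumably comes from an unrounded CDF value; the 0.3994 used here is an assumption chosen only because it reproduces that figure.

```python
def adjust_for_censoring(imputed_deaths, cdf_at_elapsed_days):
    """Scale up deaths observed so far to the deaths expected eventually.

    `cdf_at_elapsed_days` is the probability that a fatal infection has
    already resulted in death after the elapsed number of days.
    """
    if not 0.0 < cdf_at_elapsed_days <= 1.0:
        raise ValueError("CDF value must be in (0, 1]")
    return imputed_deaths / cdf_at_elapsed_days


# 0.3994 is an assumed unrounded value; the text only quotes 39.9%.
eventual_deaths = adjust_for_censoring(791, 0.3994)
print(round(eventual_deaths, 2))  # 1980.47
```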

Taking that number and dividing by the estimated IFR provides an estimate of the total number of people infected on Mar 23:
Fatality rate (IFR)    3.0%      2.0%      1.0%       0.5%
Number infected        66,016    99,024    198,047    396,094
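The table can be reproduced by dividing the censoring-adjusted death count (1980.47) by each assumed IFR. The sketch below uses decimal arithmetic with half-up rounding; the function name and the rounding mode are assumptions, chosen because they match the printed values.

```python
from decimal import Decimal, ROUND_HALF_UP


def infections_from_deaths(eventual_deaths, ifr):
    """Estimated infections = eventual deaths / IFR, rounded to a whole person."""
    count = Decimal(str(eventual_deaths)) / Decimal(str(ifr))
    return int(count.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


for ifr in ("0.03", "0.02", "0.01", "0.005"):
    print(f"IFR {float(ifr):.1%}: {infections_from_deaths('1980.47', ifr):,}")
```

Halving the assumed IFR doubles the estimated infection count, which is why the choice of IFR dominates the headline numbers.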

Repeating this procedure for every day gives the estimated infection counts over time. Estimates for days close to the current day have a high level of uncertainty because limited fatality data is available. To help control the erratic behavior, the estimates are adjusted slightly to encourage the log of the estimated infection counts to be linear. This has no practical effect on the estimates more than about 10 days before the current date.

The estimated infection count plots on the main page also include uncertainty bands (confidence intervals) formed by repeating this procedure 1000 times and shading in the 95% pointwise intervals (i.e., using the .025 and .975 quantiles).
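The pointwise bands can be sketched as follows: collect the simulated series, then take the .025 and .975 quantiles separately for each day. The function names, the linear-interpolation quantile rule, and the toy input data are assumptions made for illustration; the dashboard may compute quantiles differently.

```python
import random


def quantile(sorted_values, q):
    """Quantile by linear interpolation between order statistics (assumed rule)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def pointwise_band(simulations, lower=0.025, upper=0.975):
    """Return one (lower, upper) pair per day across all simulated series."""
    bands = []
    for day_values in zip(*simulations):  # regroup: one tuple of draws per day
        ordered = sorted(day_values)
        bands.append((quantile(ordered, lower), quantile(ordered, upper)))
    return bands


# Toy stand-in for 1000 simulated infection series over 5 days (not real data).
rng = random.Random(0)
toy_simulations = [[rng.gauss(1000 * (d + 1), 50) for d in range(5)]
                   for _ in range(1000)]
for day, (low, high) in enumerate(pointwise_band(toy_simulations)):
    print(f"day {day}: [{low:.0f}, {high:.0f}]")
```

Because the intervals are computed day by day, they describe the uncertainty at each date separately rather than for the whole curve at once.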

Data Sources

TODO

About

TODO Details of the team